Exploration of Raw Meshes: A Scientific Analysis

Authors

Nils Fahrni

Etienne Roulet

Published

March 28, 2025

Abstract

This report presents a systematic exploration of raw 3D mesh data. We outline the methodology used to visualize the raw meshes, compute key geometric properties (e.g., number of vertices, faces, edge lengths, and normal magnitudes), detect statistical outliers, and identify potential duplicates within the dataset. The findings provide insights into mesh quality, consistency, and potential anomalies, laying the groundwork for further data processing and analysis.

Introduction

The quality and consistency of raw 3D meshes are essential for applications in computer graphics, medical imaging, and computational geometry. In this work, we:

  • Visualize a subset of the raw meshes.
  • Compute key mesh properties that reflect resolution and geometric detail.
  • Identify outliers in the dataset using robust statistical methods.
  • Detect duplicate or highly similar meshes using normalized feature vectors.

Each experiment is detailed in the sections that follow, with code, methodology, and interpretation of the results.

Materials and Methods

We use Python libraries such as PyVista for mesh handling and visualization, NumPy for numerical operations, Matplotlib and Seaborn for plotting, Joblib for parallel processing, and tqdm for progress reporting. Environment variables are loaded with dotenv to manage data paths. The dataset comprises STL files located in a designated raw data directory.

The experiments are organized as follows:

  1. Raw Mesh Visualization – A random sample of meshes is displayed.
  2. Mesh Properties Analysis – Key geometric properties are computed and outliers are flagged.
  3. Duplicate Mesh Detection – Meshes are compared based on normalized features to identify near-duplicates.
  4. Mesh Quality Analysis – Outlier meshes are visualized to illustrate typical defects.
  5. Surface Smoothness and Roughness – Mean-curvature statistics are computed to separate smooth from rough surfaces.

Experiment 1: Raw Mesh Visualization

In this section, a subset of raw meshes is loaded and rendered. A screenshot is taken for each mesh using an offscreen PyVista plotter, and the results are arranged in a grid.

Code
# Import necessary libraries and load environment variables
import os
import random
import pyvista as pv
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from dotenv import load_dotenv

load_dotenv()

# Define the directory containing raw STL meshes
RAW_MESHES_DIR = os.path.join(os.getenv('DATA_DIR_PATH'), 'raw')

# List all STL files and count them
stl_files = [file for file in os.listdir(RAW_MESHES_DIR) if file.endswith(".stl")]
print("Number of STL files found:", len(stl_files))
Number of STL files found: 207
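
One fragile point in the setup above is that `os.getenv` returns `None` when `DATA_DIR_PATH` is unset, in which case `os.path.join` fails with a `TypeError`. The following is a minimal sketch of a more defensive lookup. The helper names `resolve_raw_dir` and `list_stl_files`, and the `env` parameter, are illustrative assumptions and not part of the original notebook.

```python
import os


def resolve_raw_dir(env=None, subdir="raw"):
    """Return <DATA_DIR_PATH>/<subdir>, or raise a clear error if the variable is unset.

    `env` is a hypothetical injection point for testing; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    base = env.get("DATA_DIR_PATH")
    if not base:
        raise RuntimeError("DATA_DIR_PATH is not set; check the .env file.")
    return os.path.join(base, subdir)


def list_stl_files(directory):
    """List STL files in sorted order, matching the extension case-insensitively."""
    return sorted(f for f in os.listdir(directory) if f.lower().endswith(".stl"))
```

Sorting the listing is a design choice of this sketch: it makes positional indices reproducible across machines, but it may order files differently from the plain `os.listdir` call used in this report.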

Visualization Function

A function is defined to load a mesh, render it offscreen, and return a screenshot image.

Code
def get_mesh_screenshot(file_path, width=300, height=300):
    """
    Load a mesh and return a screenshot of it using PyVista in offscreen mode.
    
    Parameters:
        file_path (str): Path to the STL file.
        width (int): Width of the output image.
        height (int): Height of the output image.
        
    Returns:
        np.ndarray or None: The screenshot image as a numpy array, or None if the mesh is empty.
    """
    mesh = pv.read(file_path)
    
    if mesh.n_points == 0:
        print(f"Warning: Mesh at {file_path} is empty. Skipping.")
        return None
    
    plotter = pv.Plotter(off_screen=True, window_size=(width, height))
    plotter.add_mesh(mesh, color="white")
    plotter.camera_position = 'xy'
    plotter.background_color = 'black'
    
    img = plotter.screenshot(transparent_background=False)
    plotter.close()
    return img

Generating and Displaying Sample Mesh Screenshots

A sample of 25 meshes is randomly selected (if available) and their screenshots are plotted in a grid layout.
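
When fewer than 25 screenshots are valid, the grid has to shrink without dropping any image. As a rough sketch, a near-square grid that always has enough cells can be derived as follows; `grid_shape` is a hypothetical helper introduced here for illustration only.

```python
import math


def grid_shape(n_images):
    """Return (n_rows, n_cols) of a near-square grid with at least n_images cells."""
    if n_images <= 0:
        # Fall back to a single empty cell so that plotting code still has one axis.
        return 1, 1
    n_cols = math.ceil(math.sqrt(n_images))
    n_rows = math.ceil(n_images / n_cols)
    return n_rows, n_cols


if __name__ == "__main__":
    for n in (25, 24, 7, 1):
        print(n, grid_shape(n))
```

Rounding up rather than down is the relevant choice: truncating the square root would, for example, leave only 16 cells for 24 images.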

Code
sample_size = 25
sampled_files = random.sample(stl_files, sample_size) if len(stl_files) >= sample_size else stl_files

screenshots = []
for file in sampled_files:
    file_path = os.path.join(RAW_MESHES_DIR, file)
    img = get_mesh_screenshot(file_path)
    if img is not None:
        screenshots.append(img)

# Choose a near-square grid large enough to hold every valid screenshot
n_shots = len(screenshots)
if n_shots < sample_size:
    print(f"Only {n_shots} valid meshes found. Adjusting grid layout accordingly.")
n_cols = max(1, int(np.ceil(np.sqrt(n_shots))))
n_rows = max(1, int(np.ceil(n_shots / n_cols)))

# squeeze=False keeps axes a 2D array even for a 1x1 grid, so flatten() always works
fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, 15), squeeze=False)
axes = axes.flatten()

for i, ax in enumerate(axes):
    if i < len(screenshots):
        ax.imshow(screenshots[i])
    ax.axis("off")

plt.suptitle("Sampled Pollen Grain Meshes", fontsize=20)
plt.tight_layout()
plt.show()

Code
def load_mesh(file_path):
    """
    Load an STL file and return a PyVista mesh object.
    
    Parameters:
        file_path (str): Path to the STL file.
    
    Returns:
        pv.PolyData: The loaded mesh.
    """
    mesh = pv.read(file_path)
    return mesh

def visualize_mesh(mesh, notebook=True):
    """
    Visualize the provided PyVista mesh.
    
    Parameters:
        mesh (pv.PolyData): The mesh to visualize.
        notebook (bool): Whether to render in a Jupyter notebook environment.
    """
    mesh.plot(notebook=notebook)


if stl_files:
    # Load the last file in the listing as a single interactive example
    example_file_path = os.path.join(RAW_MESHES_DIR, stl_files[-1])
    mesh_data = load_mesh(example_file_path)
    
    visualize_mesh(mesh_data, notebook=True)
else:
    print("No STL files found in the specified folder.")

Experiment 2: Mesh Properties Analysis

In this section, we compute several key metrics for each mesh:

  • n_vertices: Total number of vertices.
  • n_faces: Total number of faces (cells).
  • avg_edge_length: Average length of the edges.
  • std_edge_length: Standard deviation of edge lengths.
  • avg_normal_magnitude: Average magnitude of point normals.

These metrics provide insight into the resolution and geometric complexity of each mesh.
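
To make the edge-length metrics concrete, here is a small worked example on a hand-built tetrahedron. It is a sketch under the assumption that faces are stored in the VTK-style flat layout `[3, i, j, k, ...]` for triangles; the coordinates are illustrative and not taken from the dataset.

```python
import numpy as np

# Four vertices of a right-angled tetrahedron (illustrative data).
points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Flat face array: each triangle is stored as [3, i, j, k].
faces_flat = np.array([3, 0, 1, 2, 3, 0, 1, 3, 3, 0, 2, 3, 3, 1, 2, 3])
triangles = faces_flat.reshape(-1, 4)[:, 1:]

# Corner coordinates per triangle, shape (n_faces, 3 corners, 3 coordinates).
corners = points[triangles]

# Edge vectors: next corner minus current corner, wrapping around the triangle.
edges = np.roll(corners, -1, axis=1) - corners
edge_lengths = np.linalg.norm(edges, axis=2).ravel()

avg_edge_length = edge_lengths.mean()
std_edge_length = edge_lengths.std()
print(f"{avg_edge_length:.4f} +/- {std_edge_length:.4f}")
```

Because edges are collected per triangle, every edge of a closed surface is counted twice (once per adjacent face). For a closed manifold mesh this doubles all samples uniformly, so the mean and standard deviation are unaffected.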

Function Definitions

Functions are defined to load meshes, compute properties, and flag outliers using the Interquartile Range (IQR) method.
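
As a worked illustration of the IQR rule on a toy array, the sketch below computes the fences Q1 - 1.5·IQR and Q3 + 1.5·IQR and the deviation of each flagged value. The helper `iqr_bounds` and the sample values are assumptions made for this example.

```python
import numpy as np


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
lower, upper = iqr_bounds(values)

# Here Q1 = 2, Q3 = 4 and IQR = 2, so the fences are -1 and 7.
outlier_mask = (values < lower) | (values > upper)

# Deviation is measured from whichever fence was crossed.
deviation = np.where(values > upper, values - upper, lower - values)[outlier_mask]
print(lower, upper, deviation)
```

Note that for strictly positive, right-skewed quantities such as vertex counts, the lower fence can be negative, in which case no value can be flagged as a low outlier.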

Code
import os
import numpy as np
import pyvista as pv
import matplotlib.pyplot as plt
import seaborn as sns
from joblib import Parallel, delayed
from tqdm import tqdm

def compute_mesh_properties(mesh):
    """
    Compute key properties of a mesh using vectorized operations.
    
    Assumes that the mesh is triangulated.
    
    Returns a dictionary with:
      - n_vertices: number of vertices in the mesh.
      - n_faces: number of faces (using n_cells).
      - avg_edge_length: mean length of all edges.
      - std_edge_length: standard deviation of the edge lengths.
      - avg_normal_magnitude: average magnitude of point normals (if available).
    """
    n_vertices = mesh.n_points
    n_faces = mesh.n_cells  # Using n_cells instead of deprecated n_faces

    pts = mesh.points

    try:
        faces = mesh.faces.reshape((-1, 4))[:, 1:4]
    except Exception as e:
        print("Error reshaping faces. Mesh may not be triangulated.")
        return None

    face_pts = pts[faces]

    edge1 = face_pts[:, 1] - face_pts[:, 0]
    edge2 = face_pts[:, 2] - face_pts[:, 1]
    edge3 = face_pts[:, 0] - face_pts[:, 2]

    lengths1 = np.linalg.norm(edge1, axis=1)
    lengths2 = np.linalg.norm(edge2, axis=1)
    lengths3 = np.linalg.norm(edge3, axis=1)

    all_lengths = np.concatenate([lengths1, lengths2, lengths3])
    avg_edge_length = np.mean(all_lengths)
    std_edge_length = np.std(all_lengths)

    if hasattr(mesh, 'point_normals') and mesh.point_normals is not None:
        normals = mesh.point_normals
        avg_normal_magnitude = np.mean(np.linalg.norm(normals, axis=1))
    else:
        avg_normal_magnitude = None

    return {
        'n_vertices': n_vertices,
        'n_faces': n_faces,
        'avg_edge_length': avg_edge_length,
        'std_edge_length': std_edge_length,
        'avg_normal_magnitude': avg_normal_magnitude
    }

def process_file(file_path):
    """
    Helper function to process a single file. Returns mesh properties.
    """
    try:
        mesh = pv.read(file_path)
    except Exception as e:
        print(f"Error reading {file_path}: {e}")
        return None
    if mesh.n_points == 0:
        print(f"Skipping empty mesh: {file_path}")
        return None
    props = compute_mesh_properties(mesh)
    return props

def flag_outliers(values):
    """
    Identify outliers based on the IQR method and sort them by how much they deviate from the threshold.
    
    For each value outside the acceptable range, the deviation is measured as:
      - lower deviation: lower_bound - value, if the value is below lower_bound.
      - upper deviation: value - upper_bound, if the value is above upper_bound.
    
    Returns a list of tuples (index, deviation) sorted in descending order of deviation.
    """
    values = np.array(values)
    Q1 = np.percentile(values, 25)
    Q3 = np.percentile(values, 75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    outliers = []
    for i, v in enumerate(values):
        if v < lower_bound:
            deviation = lower_bound - v
            outliers.append((i, deviation))
        elif v > upper_bound:
            deviation = v - upper_bound
            outliers.append((i, deviation))
    # Sort outliers by deviation (largest deviation first)
    outliers_sorted = sorted(outliers, key=lambda x: x[1], reverse=True)
    return outliers_sorted

def analyze_dataset_properties(raw_meshes_dir):
    """
    Process all STL files in the specified directory in parallel using Joblib,
    compute mesh properties, and print the mean and standard deviation of
    each property.

    Returns a tuple (outlier_indices, stats). outlier_indices maps each
    property to a list of (file_index, deviation) tuples, where file_index
    is the position of the file in the directory listing. stats holds the
    per-mesh value lists. Both are empty dicts if no mesh could be processed.
    """
    stl_files = [file for file in os.listdir(raw_meshes_dir) if file.endswith(".stl")]
    file_paths = [os.path.join(raw_meshes_dir, file) for file in stl_files]

    results = Parallel(n_jobs=-1)(
        delayed(process_file)(fp) for fp in tqdm(file_paths, desc="Processing meshes")
    )
    # Keep each result paired with its position in stl_files so that outlier
    # indices remain valid even if some files were skipped.
    valid = [(i, r) for i, r in enumerate(results) if r is not None]
    
    if not valid:
        print("No valid meshes processed.")
        return {}, {}

    file_indices = [i for i, _ in valid]
    results = [r for _, r in valid]

    # Collect each metric into separate lists
    vertices_list = [r['n_vertices'] for r in results]
    faces_list = [r['n_faces'] for r in results]
    edge_length_list = [r['avg_edge_length'] for r in results]
    edge_length_std_list = [r['std_edge_length'] for r in results]
    normal_pairs = [(i, r['avg_normal_magnitude']) for i, r in valid
                    if r['avg_normal_magnitude'] is not None]
    normal_indices = [i for i, _ in normal_pairs]
    normal_mag_list = [v for _, v in normal_pairs]

    summary = {
        'n_vertices': (np.mean(vertices_list), np.std(vertices_list)),
        'n_faces': (np.mean(faces_list), np.std(faces_list)),
        'avg_edge_length': (np.mean(edge_length_list), np.std(edge_length_list)),
        'edge_length_std': (np.mean(edge_length_std_list), np.std(edge_length_std_list))
    }
    if normal_mag_list:
        summary['avg_normal_magnitude'] = (np.mean(normal_mag_list), np.std(normal_mag_list))
    
    print("\n--- Dataset Mesh Properties Summary (mean ± std) ---")
    for key, (mean_val, std_val) in summary.items():
        print(f" - {key}: {mean_val:.2f} ± {std_val:.2f}")
    
    # Compute outliers for each property (do not print them)
    outlier_indices = {}
    properties = {
        'n_vertices': (file_indices, vertices_list),
        'n_faces': (file_indices, faces_list),
        'avg_edge_length': (file_indices, edge_length_list),
        'edge_length_std': (file_indices, edge_length_std_list)
    }
    if normal_mag_list:
        properties['avg_normal_magnitude'] = (normal_indices, normal_mag_list)

    for key, (indices, values) in properties.items():
        # Map positions in the filtered list back to positions in stl_files
        outlier_indices[key] = [(indices[i], dev) for i, dev in flag_outliers(values)]
    
    return outlier_indices, {
        "vertices_list": vertices_list,
        "faces_list": faces_list,
        "edge_length_list": edge_length_list,
        "edge_length_std_list": edge_length_std_list,
        "normal_mag_list": normal_mag_list
    }

outliers, mesh_stats = analyze_dataset_properties(RAW_MESHES_DIR)
Processing meshes: 100%|██████████| 207/207 [00:06<00:00, 30.88it/s]

--- Dataset Mesh Properties Summary (mean ± std) ---
 - n_vertices: 269226.55 ± 216488.00
 - n_faces: 538485.99 ± 432855.65
 - avg_edge_length: 0.21 ± 0.10
 - edge_length_std: 0.09 ± 0.03
 - avg_normal_magnitude: 1.00 ± 0.00

Plotting Top Outlier Meshes

For further inspection, the top outliers based on the number of vertices are visualized. This helps identify meshes that deviate significantly from the norm.

Code
def plot_top_outliers(metric_outliers, top_n=5, offset_distance=200.0, raw_meshes_dir=RAW_MESHES_DIR):
    """
    Plot the top_n outlier meshes for a given metric.
    
    Parameters:
        metric_outliers (list of tuples): Each tuple is (index, deviation).
        top_n (int): Number of outlier meshes to plot.
        offset_distance (float): Distance offset along the x-axis.
        raw_meshes_dir (str): Directory containing STL files.
    """
    stl_files = [file for file in os.listdir(raw_meshes_dir) if file.endswith(".stl")]
    
    top_outliers = metric_outliers[:top_n]
    print(f"--- Plotting top {len(top_outliers)} outliers ---")
    for i, (idx, deviation) in enumerate(top_outliers):
        print(f" - {idx}: {stl_files[idx]} (deviation: {deviation:.2f})")
    
    plotter = pv.Plotter()
    
    for i, (idx, deviation) in enumerate(top_outliers):
        file_path = os.path.join(raw_meshes_dir, stl_files[idx])
        try:
            mesh = pv.read(file_path)
        except Exception as e:
            print(f"Error reading {file_path}: {e}")
            continue
        
        offset = np.array([i * offset_distance, 0, 0])
        mesh.translate(offset, inplace=True)
        plotter.add_mesh(mesh, color='white', opacity=0.8)
    
    plotter.show()

# Plot the top 5 outliers based on n_vertices.
plot_top_outliers(outliers['n_vertices'], top_n=5)
--- Plotting top 5 outliers ---
 - 66: 17900_Germinating_lily_Lilium_sp_pollen_grain.stl (deviation: 1234488.00)
 - 55: 17846_Common_fern_Polypodium_vulgare_spore.stl (deviation: 424338.00)
 - 44: 17826_Blue_passion_flower_Passiflora_caerulea_pollen_grain.stl (deviation: 293871.00)
 - 119: 20939_Western_hemlock_Tsuga_heterophylla_pollen_grain.stl (deviation: 115916.00)
 - 87: 20611_Cuckoo_flower_Cardamine_pratensis_pollen_grain.stl (deviation: 86168.00)

Experiment 3: Duplicate Mesh Detection

This experiment aims to identify potential duplicates or near-duplicates by comparing normalized feature vectors derived from mesh properties. The feature vector includes:

  • Number of vertices
  • Number of faces
  • Average edge length
  • Standard deviation of edge lengths

A Euclidean distance is computed between normalized features, and meshes with a distance below a set threshold are flagged as duplicates.
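
The comparison described above can be sketched on a toy feature matrix. This example uses plain NumPy in place of `StandardScaler` and `pdist`, which is a deliberate substitution for self-containment; the feature values are invented for illustration.

```python
import numpy as np

# Illustrative rows: [n_vertices, n_faces, avg_edge_length, std_edge_length].
features = np.array([
    [1000.0, 2000.0, 0.20, 0.05],
    [1000.0, 2000.0, 0.20, 0.05],   # identical to row 0
    [5000.0, 10000.0, 0.10, 0.02],
    [9000.0, 18000.0, 0.30, 0.09],
])

# Z-score each column using the population standard deviation.
std = features.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns
z = (features - features.mean(axis=0)) / std

# Pairwise Euclidean distances via broadcasting, shape (n, n).
diff = z[:, None, :] - z[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Report each unordered pair once (upper triangle, excluding the diagonal).
threshold = 0.05
i_idx, j_idx = np.triu_indices(len(features), k=1)
pairs = [(int(i), int(j)) for i, j in zip(i_idx, j_idx) if dist[i, j] < threshold]
print(pairs)
```

Two caveats follow from this formulation. The threshold is expressed in z-score units, so its meaning depends on the spread of the dataset being analyzed. In addition, four summary statistics cannot distinguish shapes that merely share resolution and edge statistics, so flagged pairs are candidates to inspect rather than confirmed duplicates.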

Code
from scipy.spatial.distance import pdist, squareform
from sklearn.preprocessing import StandardScaler

def detect_duplicate_meshes(raw_meshes_dir, threshold=0.05):
    """
    Detect duplicate meshes based on their normalized geometric properties.
    
    Parameters:
        raw_meshes_dir (str): Directory containing STL files.
        threshold (float): Euclidean distance threshold for duplicates.
    
    Returns:
        duplicates (dict): Dictionary mapping a filename to a list of duplicates.
    """
    stl_files = [file for file in os.listdir(raw_meshes_dir) if file.endswith(".stl")]
    file_paths = [os.path.join(raw_meshes_dir, file) for file in stl_files]
    
    results = Parallel(n_jobs=-1)(
        delayed(process_file)(fp) for fp in tqdm(file_paths, desc="Processing meshes for duplicates")
    )
    valid_indices = [i for i, r in enumerate(results) if r is not None]
    valid_files = [stl_files[i] for i in valid_indices]
    
    features = []
    for r in results:
        if r is not None:
            features.append([
                r['n_vertices'],
                r['n_faces'],
                r['avg_edge_length'],
                r['std_edge_length']
            ])
    features = np.array(features)
    
    scaler = StandardScaler()
    features_norm = scaler.fit_transform(features)
    dist_matrix = squareform(pdist(features_norm, metric='euclidean'))
    
    duplicates = {}
    n = len(valid_files)
    for i in range(n):
        dup_list = []
        for j in range(i + 1, n):
            if dist_matrix[i, j] < threshold:
                dup_list.append(valid_files[j])
        if dup_list:
            duplicates[valid_files[i]] = dup_list
    return duplicates

dups = detect_duplicate_meshes(RAW_MESHES_DIR, threshold=0.05)
print("Duplicate candidates found:")
for key, dup_list in dups.items():
    print(f"File: {key} duplicates: {dup_list}")
Processing meshes for duplicates: 100%|██████████| 207/207 [00:03<00:00, 55.31it/s]
Duplicate candidates found:
File: 21256_German_knotweed_Scleranthus_annuus_pollen_grain_shrunken.stl duplicates: ['21271_White_campion_Silene_alba_pollen_grain_shrunken.stl']
File: 21264_European_goldenrod_Solidago_virgaurea_pollen_grain.stl duplicates: ['21266_Field_madder_Sherardia_arvensis_pollen_grain.stl']
File: 21376_Red_fescue_Festuca_rubra_pollen_grain_shrunken.stl duplicates: ['21377_Red_fescue_Festuca_rubra_pollen_grain_shrunken.stl']
File: 21378_Hemp-agrimony_Eupatorium_cannabinum_pollen_grain_shrunken.stl duplicates: ['21379_Hemp-agrimony_Eupatorium_cannabinum_pollen_grain_shrunken.stl']

Experiment 4: Mesh Quality Analysis

General Idea

  • Meshes with unusually high std_edge_length often contain sharp artifacts or disconnected regions.
  • Very low n_vertices and n_faces may indicate overly simplified or corrupted models.
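  • Low avg_normal_magnitude values would point to degenerate or missing normals. Because point normals are normalized to unit length, this metric is 1.00 ± 0.00 across the dataset (see the summary in Experiment 2) and does not discriminate between meshes here.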
Code
# Experiment 4: Mesh Quality Analysis – Visualizing Typical Defects

# Visualize meshes with very high edge length std (irregular surfaces)
print("Visualizing top meshes with high standard deviation of edge lengths:")
plot_top_outliers(outliers['edge_length_std'], top_n=5)

# Intended: visualize meshes with very low vertex count (over-simplified or broken).
# Caveat: outliers['n_vertices'] is ranked by deviation from the IQR bounds and mixes
# low and high outliers, so this slice selects the five outliers with the smallest
# deviation, which need not be the meshes with the fewest vertices.
lowest_vertices = sorted(outliers['n_vertices'], key=lambda x: x[1], reverse=True)[-5:]
print("\nVisualizing meshes with very low vertex counts:")
plot_top_outliers(lowest_vertices, top_n=5)

# Intended: visualize meshes with low normal magnitude (noisy or flattened shapes), if normals are available.
# The slice below likewise selects the five outliers with the smallest deviation.
if 'avg_normal_magnitude' in outliers:
    print("\nVisualizing meshes with low average normal magnitude:")
    low_normal_mags = sorted(outliers['avg_normal_magnitude'], key=lambda x: x[1], reverse=True)[-5:]
    plot_top_outliers(low_normal_mags, top_n=5)
else:
    print("\nNo valid normal vectors available for avg_normal_magnitude analysis.")
Visualizing top meshes with high standard deviation of edge lengths:
--- Plotting top 5 outliers ---
 - 12: 17787_Yellow_iris_Iris_pseudacorus_pollen_grain.stl (deviation: 0.30)
 - 57: 17879_Pumpkin_Cucurbita_pepo_pollen_grain.stl (deviation: 0.04)
 - 10: 17785_Hedge_bindweed_Calystegia_sepium_pollen_grain.stl (deviation: 0.04)
 - 21: 17796_Hardy_fuchsia_Fuchsia_magellanica_pollen_grain.stl (deviation: 0.02)
 - 81: 20605_Field_maple_Acer_campestre_pollen_grain.stl (deviation: 0.02)

Visualizing meshes with very low vertex counts:
--- Plotting top 5 outliers ---
 - 64: 17886_Common_wheat_Triticum_aestivan_pollen_grain.stl (deviation: 59653.00)
 - 50: 17833_European_white_water_lily_Nymphaea_alba_pollen_grain.stl (deviation: 29662.00)
 - 57: 17879_Pumpkin_Cucurbita_pepo_pollen_grain.stl (deviation: 27549.00)
 - 61: 17883_Evening_primrose_Oenothera_fruticosa_pollen_grain.stl (deviation: 20212.00)
 - 148: 21252_Pontic_rhododendron_Rhododendron_ponticum_pollen_grain.stl (deviation: 7516.00)

Visualizing meshes with low average normal magnitude:
--- Plotting top 5 outliers ---
 - 162: 21271_White_campion_Silene_alba_pollen_grain_shrunken.stl (deviation: 0.00)
 - 117: 20937_Small-leaved_lime_Tilia_cordata_pollen_grain.stl (deviation: 0.00)
 - 119: 20939_Western_hemlock_Tsuga_heterophylla_pollen_grain.stl (deviation: 0.00)
 - 116: 20936_Field_pennycress_Thlaspi_arvense_pollen_grain_2.stl (deviation: 0.00)
 - 138: 21145_Pine_Pinus_sp_pollen_grain.stl (deviation: 0.00)
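
The outlier lists produced by `flag_outliers` are ordered by deviation from the IQR fences, so taking their tail yields the mildest outliers rather than the smallest raw values. If the goal is to inspect the meshes with the lowest vertex counts, one option is to rank the raw values directly. The sketch below assumes a plain list of per-mesh values; `k_smallest` and the sample counts are hypothetical.

```python
import numpy as np


def k_smallest(values, k=5):
    """Return [(index, value), ...] for the k smallest entries, smallest first."""
    values = np.asarray(values)
    order = np.argsort(values, kind="stable")[:k]
    return [(int(i), float(values[i])) for i in order]


# Illustrative vertex counts, not taken from the dataset.
vertex_counts = [52000, 1200, 310000, 800, 47000, 1500000]
lowest = k_smallest(vertex_counts, k=3)
print(lowest)
```

The returned indices are positions in the input list. They map to file names only if that list is aligned with the file listing, which holds when no file was skipped during processing.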

Experiment 5: Surface Smoothness and Roughness

To further characterize the mesh geometry, we compute surface curvature-based metrics to quantify whether a pollen grain appears smooth (e.g., spherical, elliptical) or rough (e.g., with spikes or ridges).

We use mean curvature as a proxy:

  • Low curvature → smooth, flat or rounded surfaces
  • High curvature → sharp features or fine details (e.g., spikes)

This experiment helps categorize the dataset into morphological classes relevant for classification and reconstruction.

We visualize histograms of curvature distributions and sample meshes from both ends of the spectrum.
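
To see why the absolute value matters when averaging mean curvature, consider the following toy arrays. They are synthetic per-vertex curvature values chosen for illustration, and `summarize` is a hypothetical helper, not a function from the analysis code.

```python
import numpy as np

# Sphere-like surface: constant positive curvature everywhere.
smooth = np.full(8, 0.5)

# Spiky surface: alternating convex and concave sharp features on a smooth base.
spiky = np.array([-4.0, 4.0, -4.0, 4.0, 0.5, 0.5, 0.5, 0.5])


def summarize(curvatures):
    """Return signed mean, mean of absolute values, and standard deviation."""
    return {
        "signed_mean": float(np.mean(curvatures)),
        "abs_mean": float(np.mean(np.abs(curvatures))),
        "std": float(np.std(curvatures)),
    }


smooth_stats = summarize(smooth)
spiky_stats = summarize(spiky)
print(smooth_stats)
print(spiky_stats)
```

In the spiky case the signed mean (0.25) is lower than that of the smooth surface (0.5) because convex and concave contributions cancel, whereas the mean absolute curvature (2.25) and the standard deviation both rise sharply. This is the motivation for using the absolute mean as the magnitude measure and the standard deviation as the roughness indicator.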

Code
def compute_curvature_metrics(mesh):
    """
    Compute curvature-based metrics from a mesh surface.
    
    Returns:
        mean_curv: Mean of mean curvature values
        std_curv: Standard deviation (roughness indicator)
    """
    try:
        curvatures = mesh.curvature(curv_type='mean')
    except Exception as e:
        print("Curvature computation failed:", e)
        return None, None  # keep the two-value shape so callers can unpack safely

    mean_curv = np.mean(np.abs(curvatures))  # take abs to avoid cancellation
    std_curv = np.std(curvatures)

    return mean_curv, std_curv
Code
curvature_results = []
curved_file_paths = []

for file in tqdm(stl_files, desc="Computing curvature metrics"):
    file_path = os.path.join(RAW_MESHES_DIR, file)
    try:
        mesh = pv.read(file_path)
        if mesh.n_points == 0:
            continue
        m, s = compute_curvature_metrics(mesh)
        if m is not None:
            curvature_results.append((m, s))
            curved_file_paths.append(file_path)
    except Exception as e:
        print("Error:", e)

mean_curvs, std_curvs = zip(*curvature_results)
4.72it/s]Computing curvature metrics:  99%|█████████▉| 205/207 [00:55<00:00,  3.92it/s]Computing curvature metrics: 100%|█████████▉| 206/207 [00:56<00:00,  2.83it/s]Computing curvature metrics: 100%|██████████| 207/207 [00:56<00:00,  3.52it/s]Computing curvature metrics: 100%|██████████| 207/207 [00:56<00:00,  3.66it/s]
Code
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
sns.histplot(mean_curvs, bins=30, kde=True)
plt.title("Distribution of Mean Curvature (Smoothness)")
plt.xlabel("Mean curvature")

plt.subplot(1, 2, 2)
sns.histplot(std_curvs, bins=30, kde=True)
plt.title("Distribution of Curvature Std (Roughness)")
plt.xlabel("Curvature standard deviation")

plt.tight_layout()
plt.show()

Code
# Top 3 smoothest and roughest meshes
smoothest = np.argsort(mean_curvs)[:3]
roughest = np.argsort(std_curvs)[-3:]

def show_mesh_group(indices, title):
    plotter = pv.Plotter(shape=(1, len(indices)))
    for i, idx in enumerate(indices):
        mesh = pv.read(curved_file_paths[idx])
        plotter.subplot(0, i)
        plotter.add_mesh(mesh, color="white")
        plotter.camera_position = 'xy'
    plotter.show(title=title)

show_mesh_group(smoothest, "Smoothest Meshes (Low Mean Curvature)")
show_mesh_group(roughest, "Roughest Meshes (High Curvature Std)")

Experiment 6: Shape Classification – Spherical vs. Non-Spherical Pollen Grains

General Idea

In this experiment, we classify each pollen mesh based on its geometric shape using the axis ratios of its bounding box. We aim to distinguish between general classes such as:

  • Spherical – nearly equal extent in all dimensions
  • Ellipsoidal – one dominant axis, but still compact
  • Elongated / Rod-like – one axis significantly longer
  • Flattened / Disc-like – one axis significantly shorter
  • Irregular – no clear symmetry

These shape classes are helpful for morphological categorization, clustering, and potentially guiding reconstruction models.

The classification is based on the ratios of the bounding box dimensions (X, Y, Z), normalized by the largest dimension.
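
As a small worked example of this normalization (a sketch with made-up extents, not values from the dataset), the following computes the sorted, normalized ratios for a hypothetical bounding box of 20 × 18 × 9 units. The helper name `normalized_bbox_ratios` is introduced here for illustration only.

```python
import numpy as np

def normalized_bbox_ratios(bounds):
    """Return the three bounding-box extents, sorted ascending and divided by the largest."""
    b = np.asarray(bounds, dtype=float)
    dims = np.array([b[1] - b[0], b[3] - b[2], b[5] - b[4]])
    largest = dims.max()
    if largest <= 0:
        raise ValueError("degenerate bounding box")
    return np.sort(dims) / largest

# Hypothetical bounds in PyVista order: (xmin, xmax, ymin, ymax, zmin, zmax)
example_ratios = normalized_bbox_ratios((0.0, 20.0, -9.0, 9.0, 0.0, 9.0))
print(example_ratios)  # approximately 0.45, 0.9, 1.0
```

The last entry is 1 by construction, so only the first two ratios carry information about the shape.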

Code
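def classify_shape(mesh, tolerance=0.15):
    """
    Classify mesh into simple geometric shape based on bounding box dimensions.

    Args:
        mesh (pyvista.PolyData): The mesh to classify.
        tolerance (float): Maximum deviation from 1.0 for a ratio to count as equal.

    Returns:
        str: one of ['spherical', 'ellipsoidal', 'elongated', 'flattened', 'irregular']
    """
    bounds = mesh.bounds  # (xmin, xmax, ymin, ymax, zmin, zmax)
    dims = np.array([
        bounds[1] - bounds[0],
        bounds[3] - bounds[2],
        bounds[5] - bounds[4],
    ])
    dims_sorted = np.sort(dims)  # ascending: smallest, middle, largest
    max_dim = dims_sorted[-1]
    if max_dim <= 0:
        return "irregular"  # degenerate bounding box; avoids division by zero
    ratios = dims_sorted / max_dim  # ratios[2] is always 1.0

    # Heuristics (first match wins).
    # Note: because ratios[2] == 1.0, the checks on ratios[2] are always true
    # for "ellipsoidal" and "elongated" and never true for "flattened", so the
    # "flattened" branch cannot be reached with these conditions.
    if np.all(np.abs(ratios - 1.0) < tolerance):
        return "spherical"
    elif ratios[2] > 0.8 and ratios[0] > 0.6:
        return "ellipsoidal"
    elif ratios[2] > 0.9 and ratios[0] < 0.5:
        return "elongated"
    elif ratios[0] < 0.3 and ratios[2] < 0.8:
        return "flattened"
    else:
        return "irregular"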
Code
shape_labels = []
shape_file_paths = []

for file in tqdm(stl_files, desc="Classifying mesh shapes"):
    file_path = os.path.join(RAW_MESHES_DIR, file)
    try:
        mesh = pv.read(file_path)
        if mesh.n_points == 0:
            continue
        shape = classify_shape(mesh)
        shape_labels.append(shape)
        shape_file_paths.append(file_path)
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
Code
from collections import Counter

shape_counts = Counter(shape_labels)
plt.figure(figsize=(8, 5))
sns.barplot(x=list(shape_counts.keys()), y=list(shape_counts.values()))
plt.title("Shape Classification of Pollen Meshes")
plt.ylabel("Number of Meshes")
plt.xlabel("Shape Class")
plt.xticks(rotation=15)
plt.tight_layout()
plt.show()

Code
def show_examples_for_shape(shape_name, n=3):
    matching = [fp for fp, lbl in zip(shape_file_paths, shape_labels) if lbl == shape_name]
    if not matching:
        print(f"No examples found for shape: {shape_name}")
        return
    sample = matching[:n]
    plotter = pv.Plotter(shape=(1, len(sample)))
    for i, path in enumerate(sample):
        mesh = pv.read(path)
        plotter.subplot(0, i)
        plotter.add_mesh(mesh, color="white")
        # Set the camera inside the loop: the setting applies only to the active subplot
        plotter.camera_position = 'xy'
    plotter.show(title=f"Examples of {shape_name} pollen")

# Show examples for a few shapes
for shape in ["spherical", "ellipsoidal", "elongated", "flattened", "irregular"]:
    show_examples_for_shape(shape)
No examples found for shape: flattened
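
The empty "flattened" class is consistent with the heuristic as written: `ratios[2]` is the largest extent divided by itself and is therefore always 1.0, so the condition `ratios[2] < 0.8` can never hold. The sketch below checks this on random extents and shows one possible variant that tests the middle ratio instead. The function name `classify_from_ratios` and its thresholds on the middle ratio are assumptions made for illustration; they are not the rules used to produce the results in this report.

```python
import numpy as np

def classify_from_ratios(ratios, tolerance=0.15):
    """Hypothetical variant: classify from ascending ratios (smallest, middle, 1.0)."""
    smallest, middle, _ = ratios
    if smallest > 1.0 - tolerance:
        return "spherical"      # all three extents nearly equal
    if middle < 0.5:
        return "elongated"      # two short axes, one long axis (rod-like)
    if smallest < 0.5 and middle > 0.8:
        return "flattened"      # one short axis, two long axes (disc-like)
    if smallest > 0.6:
        return "ellipsoidal"    # moderate deviation from a sphere
    return "irregular"

rng = np.random.default_rng(0)
dims = rng.uniform(0.1, 10.0, size=(1000, 3))
ratios = np.sort(dims, axis=1) / dims.max(axis=1, keepdims=True)

# The largest ratio is 1.0 for every sample, so "ratios[2] < 0.8" never fires.
largest_always_one = bool(np.all(ratios[:, 2] == 1.0))

disc_label = classify_from_ratios(np.array([0.2, 0.95, 1.0]))
rod_label = classify_from_ratios(np.array([0.2, 0.25, 1.0]))
print(largest_always_one, disc_label, rod_label)
```

Testing the middle ratio is what separates a rod (two short axes) from a disc (one short axis); the smallest ratio alone cannot make that distinction.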

More Robust Shape Classification

Definition and Implementation

This classification combines the extents of the mesh's axis-aligned bounding box with a PCA (Principal Component Analysis) of the mesh's point cloud. The PCA eigenvalues represent the variance of the points along each principal axis and, unlike the bounding box extents, do not depend on the orientation of the mesh.

Let the mesh’s axis‐aligned bounding box have extents
\[ d_x = x_{\max} - x_{\min},\quad d_y = y_{\max} - y_{\min},\quad d_z = z_{\max} - z_{\min}. \]
Sort these so that
\[ d_{(1)} \le d_{(2)} \le d_{(3)}, \]
and define the normalized bounding box ratios as
\[ r_i = \frac{d_{(i)}}{d_{(3)}},\quad i=1,2,3 \quad (r_3=1). \]

Similarly, let PCA on the mesh’s point cloud yield eigenvalues
\[ \lambda_1 \le \lambda_2 \le \lambda_3, \]
and define the PCA ratios as
\[ p_i = \frac{\lambda_i}{\lambda_3},\quad i=1,2,3 \quad (p_3=1). \]
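
To illustrate how the two sets of ratios complement each other, the sketch below computes \(r_i\) and \(p_i\) for a synthetic ellipsoidal point cloud before and after a rotation. The point cloud, the semi-axes (3, 2, 1), and the helper names `bbox_ratios` and `pca_ratios` are assumptions made for this example; no mesh from the dataset is involved.

```python
import numpy as np

def bbox_ratios(points):
    """Ascending axis-aligned bounding-box extents divided by the largest extent."""
    dims = np.sort(points.max(axis=0) - points.min(axis=0))
    return dims / dims[-1]

def pca_ratios(points):
    """Ascending covariance eigenvalues divided by the largest eigenvalue."""
    eigenvalues = np.linalg.eigvalsh(np.cov(points, rowvar=False))
    return eigenvalues / eigenvalues[-1]

rng = np.random.default_rng(42)
# Points on a unit sphere, stretched to an ellipsoid with semi-axes 3, 2, 1.
directions = rng.normal(size=(5000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
cloud = directions * np.array([3.0, 2.0, 1.0])

# Rotate the cloud by 45 degrees about the z-axis.
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
rotated = cloud @ rotation.T

print("bbox:", bbox_ratios(cloud), "->", bbox_ratios(rotated))
print("pca: ", pca_ratios(cloud), "->", pca_ratios(rotated))
```

In this example the bounding-box ratios change with the orientation, while the PCA ratios agree up to floating-point error. Because the eigenvalues are variances, \(p_i\) scales with the square of the corresponding axis-length ratio, so the same numeric threshold is stricter for \(p_i\) than for \(r_i\).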

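Using a tolerance \(\epsilon\) (0.15 by default), the shape classes are defined as follows. The rules are tested in the order listed, and the first rule that matches determines the class: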

  • Spherical:
    \[ |r_i - 1| < \epsilon \quad \text{and} \quad |p_i - 1| < \epsilon \quad \text{for } i=1,2,3. \]

  • Elongated:
    \[ \min(r_1,p_1) < 0.5 \quad \text{and} \quad \max(r_2,p_2) > 0.7. \]

  • Flattened:
    \[ \min(r_1,p_1) < 0.3 \quad \text{and} \quad \min(r_2,p_2) < 0.8. \]

  • Ellipsoidal:
    \[ \max(r_1,p_1) > 0.6. \]

  • Irregular: Falls into none of the above classes.

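Plots with minimal convex hulls are used to visualize the shape classification. The convex hull is computed from the inlier points to reduce the effect of noise. From the hull's surface area \(A_{\text{hull}}\) and volume \(V_{\text{hull}}\), an area-to-volume ratio is derived as
\[ H = \frac{A_{\text{hull}}}{V_{\text{hull}} + \varepsilon'}, \]
where \(\varepsilon'\) is a small constant (\(10^{-8}\) in the implementation) that prevents division by zero.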
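
As a reference point for interpreting \(H\) (a worked example, not a result from the dataset), consider an ideal sphere of radius \(R\) and ignore \(\varepsilon'\):
\[ H_{\text{sphere}} = \frac{4\pi R^2}{\tfrac{4}{3}\pi R^3} = \frac{3}{R}. \]
\(H\) therefore has units of inverse length and depends on the physical scale of the mesh: doubling all coordinates halves \(H\). Any fixed threshold on \(H\) implicitly assumes a particular unit and size range for the meshes.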

Analysis of the Robust Classification

Code
import os
import numpy as np
import pyvista as pv
from tqdm import tqdm
from scipy.spatial import ConvexHull

def classify_shape_robust(mesh, tolerance=0.15, use_convex_hull=True):
    """
    Classify a mesh into one of the following shape categories using a combination
    of bounding box analysis, PCA-based features, and optionally convex hull metrics:
    'spherical', 'ellipsoidal', 'elongated', 'flattened', or 'irregular'.

    The approach includes:
      - Bounding box analysis: sensitive to the overall mesh dimensions.
      - PCA-based analysis: yields rotation invariant eigenvalues that capture the
        point distribution.
      - Optional convex hull analysis: provides additional robustness by considering
        the surface-area-to-volume ratio, which can help detect extreme irregularities.

    Args:
        mesh (pyvista.PolyData): The mesh to classify.
        tolerance (float): Threshold for heuristic deviations.
        use_convex_hull (bool): Whether to incorporate convex hull-based features.

    Returns:
        str: The classified shape category.
    """
    # --- 0. Preliminary Check ---
    if mesh.n_points == 0:
        return "irregular"  # No points to analyze

    # --- 1. Bounding Box Analysis ---
    # Get bounds: (xmin, xmax, ymin, ymax, zmin, zmax)
    bounds = mesh.bounds
    bbox_dims = np.array([
        bounds[1] - bounds[0],
        bounds[3] - bounds[2],
        bounds[5] - bounds[4],
    ])
    max_dim = np.max(bbox_dims)
    if max_dim <= 0:
        return "irregular"

    # Sort dimensions and calculate ratios; ratios are in ascending order
    bbox_sorted = np.sort(bbox_dims)
    bbox_ratios = bbox_sorted / max_dim

    # --- 2. PCA-Based Analysis ---
    points = mesh.points

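    # Outlier filtering: drop points whose distance from the centroid exceeds the
    # mean distance by more than 3 standard deviations. The mean is included because
    # surface points lie at similar distances from the centroid, so the standard
    # deviation alone is small compared with the distances themselves.
    center = np.mean(points, axis=0)
    distances = np.linalg.norm(points - center, axis=1)
    mean_dist = np.mean(distances)
    std_dist = np.std(distances)
    inliers = points[distances <= mean_dist + 3 * std_dist]
    if inliers.shape[0] < 3:  # Fallback in case too many points are filtered out
        inliers = points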

    # Center the inlier points
    centered = inliers - np.mean(inliers, axis=0)
    cov = np.cov(centered, rowvar=False)
    # Compute the eigenvalues for a symmetric covariance matrix (sorted in ascending order)
    eigenvalues = np.linalg.eigvalsh(cov)
    eigenvalues = np.sort(np.maximum(eigenvalues, 0))  # Ensure non-negative values
    max_eigen = eigenvalues[-1]
    if max_eigen <= 0:
        return "irregular"
    pca_ratios = eigenvalues / max_eigen

    # --- 3. Optional: Convex Hull Analysis ---
    hull_ratio = None
    if use_convex_hull:
        try:
            # Calculate the convex hull using the inlier points to reduce the effect of noise
            hull = ConvexHull(inliers)
            hull_volume = hull.volume
            hull_area = hull.area
            eps = 1e-8  # small constant to avoid division by zero
            hull_ratio = hull_area / (hull_volume + eps)
        except Exception as e:
            hull_ratio = None

    # --- 4. Combined Classification ---
    # Spherical: Both bounding box and PCA ratios should be nearly equal.
    if np.all(np.abs(bbox_ratios - 1.0) < tolerance) and np.all(np.abs(pca_ratios - 1.0) < tolerance):
        return "spherical"
    
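    # Elongated: One compressed dimension with the other dimensions relatively high.
    # Note: one short axis and two long axes describe a disc-like shape under the
    # class definitions given earlier; the label is kept as in the original heuristic.
    if (bbox_ratios[0] < 0.5 or pca_ratios[0] < 0.5) and (bbox_ratios[1] > 0.7 or pca_ratios[1] > 0.7):
        return "elongated"
    
    # Flattened: Two dimensions are significantly compressed.
    # Note: two short axes and one long axis describe a rod-like shape under the
    # class definitions given earlier; the label is kept as in the original heuristic.
    if (bbox_ratios[0] < 0.3 or pca_ratios[0] < 0.3) and (bbox_ratios[1] < 0.8 or pca_ratios[1] < 0.8):
        return "flattened"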
    
    # Ellipsoidal: Moderate deviations from spherical form that are not extremely elongated or flattened.
    if (bbox_ratios[0] > 0.6 or pca_ratios[0] > 0.6):
        return "ellipsoidal"
    
    # Extra check using convex hull metrics:
    # A very high area-to-volume ratio might indicate an irregular or noisy shape.
    # Note: meshes that reach this point are labeled "irregular" either way, so
    # hull_ratio does not currently change the result.
    if hull_ratio is not None and hull_ratio > 10:
        return "irregular"
    
    # Default classification if none of the conditions are met.
    return "irregular"
Code
# --- Classification of STL files ---
# Assumes 'stl_files' is a list of filenames and RAW_MESHES_DIR is the directory containing them.
meshes = []         # To store the loaded pyvista meshes
shape_labels = []   # To store the corresponding shape labels

for file in tqdm(stl_files, desc="Classifying mesh shapes"):
    file_path = os.path.join(RAW_MESHES_DIR, file)
    try:
        mesh = pv.read(file_path)
        # Skip meshes with no points.
        if mesh.n_points == 0:
            continue
        shape = classify_shape_robust(mesh)
        meshes.append(mesh)
        shape_labels.append(shape)
    except Exception as e:
        print(f"Error processing {file_path}: {e}")

Shape Classification Results

Code
from collections import Counter

shape_counts = Counter(shape_labels)
plt.figure(figsize=(8, 5))
sns.barplot(x=list(shape_counts.keys()), y=list(shape_counts.values()))
plt.title("Shape Classification of Pollen Meshes")
plt.ylabel("Number of Meshes")
plt.xlabel("Shape Class")
plt.xticks(rotation=15)
plt.tight_layout()
plt.show()

Code
# --- 2D Projection Plot ---
from sklearn.decomposition import PCA
from scipy.spatial import ConvexHull

# This part projects each 3D mesh into 2D using PCA and draws its convex hull outline.
num_meshes = len(meshes)
cols = 5  # Adjust the number of columns as desired.
rows = num_meshes // cols + (1 if num_meshes % cols != 0 else 0)

fig, axes = plt.subplots(rows, cols, figsize=(3 * cols, 3 * rows))
axes = axes.flatten()

for i, (mesh, label) in enumerate(zip(meshes, shape_labels)):
    # Get the mesh's 3D points.
    points = mesh.points
    # Perform PCA to reduce dimensions from 3D to 2D.
    pca = PCA(n_components=2)
    points_2d = pca.fit_transform(points)
    
    ax = axes[i]
    # Scatter plot of the points (light blue for background points).
    ax.scatter(points_2d[:, 0], points_2d[:, 1], s=1, color='lightblue', alpha=0.5)
    
    # Compute and plot the convex hull to show the overall shape.
    if points_2d.shape[0] >= 3:
        hull = ConvexHull(points_2d)
        hull_points = points_2d[hull.vertices]
        # Close the hull polygon.
        hull_points = np.concatenate([hull_points, hull_points[0:1]], axis=0)
        ax.plot(hull_points[:, 0], hull_points[:, 1], 'r-', lw=2)
    
    ax.set_title(label, fontsize=10)
    ax.axis('equal')
    ax.axis('off')

# Remove any unused subplots.
for j in range(i + 1, len(axes)):
    axes[j].remove()

plt.tight_layout()
plt.show()
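
The red outline drawn by the code above is the 2D convex hull of the projected points. As a dependency-free illustration of what that outline is (this is not the report's code, which relies on `scipy.spatial.ConvexHull`), the following sketch computes a hull with Andrew's monotone chain algorithm and measures its area with the shoelace formula. The function names `convex_hull_2d` and `polygon_area` are hypothetical helpers introduced only for this example.

```python
def convex_hull_2d(points):
    """Andrew's monotone chain: return hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return pts  # Degenerate input: no polygon can be formed.

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Toy projected points: a 2 x 2 square with one interior and one edge point.
projected = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
outline = convex_hull_2d(projected)
print(outline, polygon_area(outline))
```

Interior points and points lying on a hull edge are discarded, so only the corner vertices remain. The hull area is one possible scalar descriptor of the projected outline; whether it is useful for this dataset has not been evaluated here.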

Mesh Statistics and Summary

Table Summary

Code
import pandas as pd

metrics = ["n_vertices", "n_faces", "avg_edge_length", "std_edge_length"]
mean_vals = [
    np.mean(mesh_stats["vertices_list"]),
    np.mean(mesh_stats["faces_list"]),
    np.mean(mesh_stats["edge_length_list"]),
    np.mean(mesh_stats["edge_length_std_list"]),
]
std_vals = [
    np.std(mesh_stats["vertices_list"]),
    np.std(mesh_stats["faces_list"]),
    np.std(mesh_stats["edge_length_list"]),
    np.std(mesh_stats["edge_length_std_list"]),
]

if mesh_stats["normal_mag_list"]:
    metrics.append("avg_normal_magnitude")
    mean_vals.append(np.mean(mesh_stats["normal_mag_list"]))
    std_vals.append(np.std(mesh_stats["normal_mag_list"]))

summary_df = pd.DataFrame({
    "Metric": metrics,
    "Mean": mean_vals,
    "Std Dev": std_vals
})

summary_df.style.format({"Mean": "{:.2f}", "Std Dev": "{:.2f}"})
   Metric                     Mean     Std Dev
0  n_vertices            269226.55   216488.00
1  n_faces               538485.99   432855.65
2  avg_edge_length            0.21        0.10
3  std_edge_length            0.09        0.03
4  avg_normal_magnitude       1.00        0.00
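
The mean face count in the table is almost exactly twice the mean vertex count. For a closed, manifold triangle mesh, the Euler characteristic is χ = V - E + F with E = 3F/2, so F = 2V - 2χ, and a sphere-like (genus 0) surface has χ = 2. The sketch below applies this relation to the reported means as a sanity check. It assumes every mesh is a closed manifold triangle mesh, which this report does not verify, and the helper names are hypothetical.

```python
def expected_faces_closed_genus0(n_vertices):
    """Face count of a closed, genus-0 triangle mesh: F = 2V - 4."""
    return 2 * n_vertices - 4


def implied_euler_characteristic(n_vertices, n_faces):
    """chi = V - E + F with E = 3F/2 (closed triangle mesh), i.e. chi = V - F/2."""
    return n_vertices - n_faces / 2


# Mean values copied from the summary table above.
mean_vertices = 269226.55
mean_faces = 538485.99

expected = expected_faces_closed_genus0(mean_vertices)
surplus = mean_faces - expected
chi = implied_euler_characteristic(mean_vertices, mean_faces)

print(f"expected faces (genus 0): {expected:.2f}")
print(f"surplus faces vs. genus 0: {surplus:.2f}")
print(f"implied mean Euler characteristic: {chi:.2f}")
```

Because the relation is linear, it carries over to the means only if it holds for each mesh. The reported mean exceeds the genus-0 expectation by roughly 37 faces, which may indicate that some meshes have handles, open boundaries, multiple components, or non-manifold elements. A per-mesh check would be needed to confirm any of these.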

Results and Discussion

The experiments reveal the following key findings:

  • Visualization: A random sample of meshes was rendered, confirming that most STL files contain valid geometries suitable for further analysis.
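  • Mesh Properties: Vertex and face counts vary widely across meshes. The mean vertex count is about 269,000 (standard deviation about 216,000) and the mean face count is about 538,000 (standard deviation about 433,000). The mean per-mesh average edge length is 0.21 (standard deviation 0.10), which reflects differences in mesh resolution, and normal magnitudes are 1.00 with negligible spread, indicating unit-length normals.
  • Outlier Detection: Some meshes were flagged as outliers on vertex count and other metrics. These may be damaged or overly simplified meshes.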
  • Duplicate Detection: Using normalized geometric features, several candidate duplicate meshes were flagged, suggesting potential redundancies in the dataset.
  • Extended Analysis: Additional properties (when available) further confirmed the overall consistency of the dataset while highlighting specific cases for further investigation.
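
The outlier and duplicate checks summarized above can be sketched with the standard library alone. The exact rules used in this analysis are not restated in this section, so the choices below are assumptions: outliers are flagged with Tukey's interquartile-range fences (k = 1.5), and near-duplicates are pairs of meshes whose min-max normalized feature vectors lie within a Euclidean tolerance. The function names, the tolerance, and the toy feature tuples are hypothetical.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def near_duplicates(features, tol=0.01):
    """Return index pairs whose min-max normalized feature vectors are within tol."""
    columns = list(zip(*features))
    mins = [min(c) for c in columns]
    # Guard against constant columns, which would otherwise divide by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    normed = [
        [(x - m) / s for x, m, s in zip(row, mins, spans)]
        for row in features
    ]
    pairs = []
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            if math.dist(normed[i], normed[j]) <= tol:
                pairs.append((i, j))
    return pairs


# Toy data: vertex counts with one extreme value.
vertex_counts = [1000, 1100, 950, 1050, 1020, 50000]
print(iqr_outliers(vertex_counts))

# Toy features per mesh: (n_vertices, n_faces, avg_edge_length).
features = [(1000, 2000, 0.20), (1000, 2000, 0.20), (5000, 10000, 0.35)]
print(near_duplicates(features))
```

The pairwise comparison is quadratic in the number of meshes, which is acceptable for a few hundred files. Flagged pairs are only candidates: two different meshes can share similar summary features, so a visual or geometric comparison is still needed before removing anything.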

Findings and Next Steps for the Preprocessing Pipeline

  • TODO: Explain what the results show and how they can be used to improve the preprocessing pipeline.

Conclusion

This exploratory analysis of raw 3D meshes gives an overview of mesh quality, consistency, and potential anomalies. Together, visualization, property computation, outlier detection, and duplicate detection provide a first-pass framework for data quality assessment. Future work may refine these metrics and add further geometric and topological analyses to improve mesh validation and processing.